ABSTRACT

This final progress report summarizes the results of a study that successfully
determined the functional architecture of visual motion perception in the sense
of defining the mechanisms involved and the relations between them. It was
proved that visual motion is computed by two neural systems: primitive
motion-energy extraction (e.g., Reichardt detector) and higher-level feature
tracking. A psychophysical pedestal paradigm was used to exclude the
feature-tracking process and thereby to obtain pure measures of motion-energy
extraction. Motion-energy extraction was found to be exclusively monocular,
fast (cutoff frequency is 12 Hz) and sensitive (can utilize 0.2% contrast),
"bottom-up", and to operate on both luminance (first-order) and contrast
(second-order) motion stimuli. Motion feature tracking was found to operate
interocularly as well as monocularly, to have a cutoff frequency of 3 Hz, and to
be both bottom-up (it computes motion from luminance, contrast, depth,
motion-motion, flicker and other types of stimuli) and top-down (e.g.,
attentional states influence what appears to move). The full report is appended.
Project: The Functional Architecture of Human Visual Motion Perception

Historically,  visual  motion  perception  has  been  a  central  problem  in  perceptual  theory.  On 
the one hand, motion appears to involve an early stage of pattern recognition (the "same" pattern
must be located first here and then there); on the other hand, motion appears to invoke a unique
perceptual experience quite different from pattern or shape perception.

Almost  from  the  beginning  of  the  experimental  study  of  motion  perception,  it  has  been 
evident  that  more  than  one  kind  of  computation  is  involved,  and  there  has  been  a  plethora  of  dual¬ 
process  and  multi-process  motion  theories.  Some  recent  examples  are  short-  versus  long-range 
motion  (1),  motion-energy  and  Reichardt  detectors  (2)  versus  zero  crossings  (3)  or  gradients  (4), 
first-order  versus  second-order  motion  (5),  and  so  on.  While  there  clearly  is  a  kernel  of  tmth 
underlying  each  of  these  dichotomies,  there  have  been  two  pervasive  problems;  so  far,  there  have 
not  been  operations  that  give  pure  measures  of  each  proposed  mechanism,  nor  has  there  been  a 
clear  distinction  between  the  algorithm  by  which  motion  is  computed  and  fire  preprocessing  of  the 
visual  image  prior  to  the  point  of  motion  computation. 

Here,  we  offer  a  combination  of  two  basic  paradigms  (pedestal  and  interocular  displays)  plus 
several  subsidiary  paradigms  (stimulus  superpositions  with  varying  phases  and  directions,  stimulus 
mixtures,  and  attentional  manipulations)  that  offer  a  clear  indication  of  the  motion  algorithms  and 
yield  surprising  insights  into  the  image  transformations  involved.  The  pedestal  paradigm  will  be 
self-evident, but its ultimate significance is that it offers a litmus test for a Reichardt (or the
equivalent motion-energy) algorithm, so we briefly review these models first.

Reichardt  (and  Motion  Energy)  Models 

Computational  theories  of  motion  perception  date  from  Reichardt’s  model  for  insect  vision 
(6), which was adapted for human perception by van Santen & Sperling (2). A Reichardt detector
consists of two mirror-image subunits (e.g., "Left" and "Right") tuned to opposite directions of
motion  (Fig.  la).  Subunit  R  multiplies  the  signal  at  spatial  location  A  with  the  delayed  signal  at  a 
rightward  adjacent  spatial  location  B.  Subunit  L  multiplies  signal  at  spatial  location  B  with  the 
delayed  signal  at  spatial  location  A.  The  output  of  each  subunit  is  integrated  for  a  period  of  time 
and  the  direction  of  movement  is  indicated  by  the  sign  of  the  difference  between  the  subunit 
outputs (6). Subsequent theories (7,8) of motion perception involving, essentially, Fourier analysis
of the x,y,t motion stimulus to compute "motion energy" were shown to be computationally
equivalent to the Reichardt model (8).
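
The opponent computation described above can be illustrated with a minimal discrete-time sketch. It is not the model as implemented in references (2) or (6): the sampling scheme, the quarter-cycle delay and spacing, and all function names are assumptions made for illustration. The assignment of which input is delayed in each subunit is likewise an assumption, chosen so that a positive output signals motion from A to B, as in the Figure 1 caption.

```python
import math

def drifting_sine(n_samples, cycles_per_sample, spatial_phase, direction):
    """Samples of sin(spatial_phase - direction * 2*pi*f*t) at one location.

    direction=+1 drifts from A toward B, -1 drifts from B toward A,
    and 0 gives a stationary pattern (hypothetical helper).
    """
    omega = 2.0 * math.pi * cycles_per_sample
    return [math.sin(spatial_phase - direction * omega * t) for t in range(n_samples)]

def reichardt_output(signal_a, signal_b, delay):
    """Time-averaged difference between the two mirror-image subunits."""
    total = 0.0
    for t in range(delay, len(signal_a)):
        a_to_b = signal_a[t - delay] * signal_b[t]  # delayed A times current B
        b_to_a = signal_b[t - delay] * signal_a[t]  # delayed B times current A
        total += a_to_b - b_to_a
    return total / (len(signal_a) - delay)

PERIOD = 16                 # samples per temporal cycle (assumed)
DELAY = 4                   # quarter-cycle delay (assumed)
N = 4 * PERIOD + DELAY      # average over exactly four full cycles
PHASE_B = math.pi / 2       # B lies a quarter spatial cycle from A (assumed)

def response(direction):
    a = drifting_sine(N, 1.0 / PERIOD, 0.0, direction)
    b = drifting_sine(N, 1.0 / PERIOD, PHASE_B, direction)
    return reichardt_output(a, b, DELAY)

rightward = response(+1)    # positive: motion from A to B
leftward = response(-1)     # negative: motion from B to A
stationary = response(0)    # zero: static patterns are ignored
```

With these assumed parameters the sign of the output reverses with the direction of drift, and a stationary pattern yields zero output.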


Insert  Figure  1  here. 


Reichardt  motion  analysis  is  most  naturally  applied  directly  to  drifting  modulations  of 
luminance  that  typically  represent  rigidly  moving  objects.  Because  it  can  be  applied  directly  to  a 
luminance signal, it is called 1st-order motion extraction (5). However, Chubb and Sperling (9)
demonstrated clear motion perception in broad classes of drift-balanced and microbalanced
stimuli  that  were  constructed  of  drifting  modulations  of  contrast,  spatial  frequency,  texture  type,  or 
flicker  (see  also  10).  Such  stimuli  are  said  to  activate  2nd-order  motion  mechanisms  (5)  because 
their  motion  is  invisible  to  Reichardt  or  motion-energy  detectors.  Chubb  and  Sperling  (9)  noted 
that  grossly  nonlinear  preprocessing  (e.g.,  absolute  value  or  square-law  rectification)  prior  to  a 
Reichardt detector could expose the latent motion in drift-balanced and microbalanced stimuli.
Because first-order motion can be observed with smaller-size stimuli than is the case for
second-order motion, it is generally believed that "short-range" and "first-order" are the co-defining
characteristics of one of the motion systems. We shall soon see the situation is more complicated
than  this. 

Van Santen & Sperling (2) proved two extremely useful properties of Reichardt detectors
(and of the equivalent motion-energy systems): (1) Pseudo-linearity: When a stimulus is composed
of several component sine waves with different temporal frequencies, the detector’s response to the
sum is the sum of the responses to individual inputs (pseudo-linearity because linearity holds only
for sines of different temporal frequencies). (2) Static displays are ignored: The output to any
sinusoid of zero temporal frequency (a stationary pattern) is zero. From (1) and (2), it follows
that  adding  a  stationary  sine  (temporal  frequency  is  zero,  therefore  output  is  zero)  to  any  moving 
pattern  (moving  means  temporal  frequency  is  nonzero)  does  not  change  the  output  of  a  Reichardt 
detector  to  the  moving  stimulus. 

The  Pedestal  Test 

We  exploit  the  pseudo-linearity  of  Reichardt  detectors  by  creating  compound  stimuli 
consisting of a stationary sine (the pedestal, Figs. 1b & 1e) plus a linearly moving sine grating (the
test, Figs. 1c & 1f). The peaks and valleys of the compound stimulus oscillate back and forth
(Figs. 1d & 1g). Nevertheless, the output of a Reichardt detector is exactly the same for the
pedestal-plus-test stimulus as for the test alone. In practice, nonlinearities of human vision before and
after the movement computation require that the amplitudes of these sine stimuli be small (e.g.,
less than about 5 percent modulation depth) in order for the pedestal predictions to hold exactly.
The question is: how do human observers perceive the compound stimulus? Do they track the
peaks  (which  implies  a  feature  tracking  mechanism)  or  do  they  perceive  the  linear  motion  of  the 
test  stimulus? 
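
The pedestal prediction can be checked numerically with a small sketch. The detector below is a simplified discrete-time opponent correlator, not the model of reference (2); the delay, the spacing between the two inputs, the pedestal phase, and the amplitudes (1% test, 2% pedestal) are all illustrative assumptions.

```python
import math

PERIOD = 16                 # samples per temporal cycle of the test (assumed)
DELAY = 4                   # quarter-cycle delay (assumed)
N = 4 * PERIOD + DELAY      # average over exactly four full cycles
PHASE_B = math.pi / 2       # spatial phase lag between inputs A and B (assumed)

def reichardt_output(signal_a, signal_b, delay):
    """Time-averaged difference between the two mirror-image subunits."""
    total = 0.0
    for t in range(delay, len(signal_a)):
        total += signal_a[t - delay] * signal_b[t] - signal_b[t - delay] * signal_a[t]
    return total / (len(signal_a) - delay)

def stimulus(test_amp, pedestal_amp, pedestal_phase, spatial_phase):
    """Stationary pedestal plus drifting test, sampled at one location."""
    omega = 2.0 * math.pi / PERIOD
    pedestal = pedestal_amp * math.sin(spatial_phase + pedestal_phase)
    return [pedestal + test_amp * math.sin(spatial_phase - omega * t) for t in range(N)]

def response(test_amp, pedestal_amp, pedestal_phase=0.3):
    a = stimulus(test_amp, pedestal_amp, pedestal_phase, 0.0)
    b = stimulus(test_amp, pedestal_amp, pedestal_phase, PHASE_B)
    return reichardt_output(a, b, DELAY)

test_alone = response(0.01, 0.0)     # moving test only
pedestalled = response(0.01, 0.02)   # same test on a 2x pedestal
```

In this sketch the two outputs agree to within floating-point error, because the pedestal contributes only terms that average to zero over whole temporal cycles.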

The pedestal-plus-test stimulus is defined in the x,t domain. Consider the equivalent
pedestal-plus-test defined in the x,y domain, in which the pedestal is a vertical grating and the test
is a slanted grating. Such an x,y texture is displayed (at high contrast) in Fig. 1g, and the answer
is obvious. Observers do not directly perceive the component gratings: they perceive primarily
back-and-forth oscillation, and an apparent amplitude modulation. In these illustrations, the
stationary spatial sine wave pedestal has twice the amplitude of the linearly moving stimulus; it
produces a back-and-forth phase oscillation equal to 1/6 of the spatial cycle (Fig. 1d), and this
phase oscillation is what is perceived (11).
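
The 1/6-cycle figure follows from adding two sinusoids of the same spatial frequency: the peak of P cos(x) + T cos(x - θ) lies at phase atan2(T sin θ, P + T cos θ), whose extreme values are ±arcsin(T/P). The sketch below checks this for a 2:1 ratio; the function names and the sampling of the test phase are assumptions for illustration.

```python
import math

def peak_phase(pedestal_amp, test_amp, test_phase):
    """Phase (radians) of the peak of P*cos(x) + T*cos(x - test_phase)."""
    in_phase = pedestal_amp + test_amp * math.cos(test_phase)
    quadrature = test_amp * math.sin(test_phase)
    return math.atan2(quadrature, in_phase)

def peak_excursion_in_cycles(pedestal_amp, test_amp, steps=360):
    """Peak-to-peak travel of the compound peak, as a fraction of a spatial cycle."""
    phases = [peak_phase(pedestal_amp, test_amp, 2.0 * math.pi * k / steps)
              for k in range(steps)]
    return (max(phases) - min(phases)) / (2.0 * math.pi)

excursion = peak_excursion_in_cycles(2.0, 1.0)        # 2:1 pedestal-to-test ratio
closed_form = 2.0 * math.asin(0.5) / (2.0 * math.pi)  # 2*arcsin(T/P) over one cycle
```

Both values come out at one sixth of a cycle: the peak swings 30 deg to either side of the pedestal peak and never travels farther.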

By a pedestalled stimulus, we refer to a pedestal-plus-motion stimulus with a 2:1 pedestal:test
amplitude ratio. We conducted formal experiments to determine how pedestalled tests are
perceived in the x,t motion domain. To reiterate: the Reichardt model predicts that motion
direction extraction should be completely unaffected by the pedestal. Therefore, we first determine
each subject’s threshold amplitude for the discrimination of a leftward from a rightward moving
pure sinewave grating. We then add a pedestal with twice this measured threshold amplitude. If
the judgment were based on the output of a Reichardt detector, we expect the subject’s accuracy of
left versus right judgments to be the same with and without the pedestal. On the other hand, if the
motion direction computation were based on stimulus features (peaks, valleys, zero-crossings, etc.),
the  pedestalled  motion  would  appear  oscillatory,  and  it  would  be  impossible  for  subjects  to  judge 
motion  direction  of  the  test. 

Stimuli 

We constructed four different types of motion stimuli: a luminance grating (Fig. 1e), a
contrast grating (Fig. 1h), a depth grating without any monocular motion cue (Fig. 1k), and a
motion-defined moving grating (Fig. 1l). Except for the luminance stimulus, only the pattern of
modulation, not the stimulus itself, moves (either to the left or to the right).

Luminance  grating  (12).  The  luminance  stimulus  is  the  sort  of  first-order  motion  stimulus,  a 
rigidly translating sinewave pattern (Figs. 1b and 1f) from which traditional motion psychophysics
has  evolved. 

Contrast  grating  (13).  The  contrast  grating  is  a  pure  second-order  stimulus:  Its  expected 
luminance  is  the  same  everywhere;  its  motion  cannot  be  determined  by  Reichardt  detectors. 
However, an initial stage of spatial filtering, followed by a nonlinearity such as fullwave
rectification (e.g., absolute value or squaring) can expose the contrast grating’s motion to standard
motion analysis (e.g., Reichardt detectors; see Chubb & Sperling, 1989b).

Depth grating (14). The dynamic stereo-depth grating is created from stereo views of left-
and right-half images. It appears in depth as a surface whose distance from the observer varies, as
illustrated.  The  grating  (and  its  depth)  exist  only  as  a  space-varying  correlation  between  the  pixels 
in  the  left-  and  right-eye  images;  each  monocular  image  is  completely  homogeneous  without  any 
hint  of  a  grating,  and  successive  images  are  uncorrelated. 

Motion-defined  motion  (15).  The  motion-defined  grating  consists  of  dots  that  make  step 
jumps  in  successive  frames.  The  proportion  of  upward  versus  downward  jumping  dots  varies 
sinusoidally from left to right. To perceive the movement of the motion-defined grating requires
(1) computing the direction of motion of the dots, and (2) noting that the sinewave pattern of
dot-motion moves with time. This kind of "motion-from-motion" (16) seems to suggest a hierarchical
organization  of  motion  detectors. 
Procedure

Within  an  experimental  session  (17),  only  one  type  of  stimulus  was  presented,  but  various 
temporal  frequencies  were  mixed.  Subjects  initiated  a  trial  by  pressing  a  key.  A  fixation  point 
immediately  appeared  in  the  center  of  the  display  and  remained  on  throughout  the  trial;  0.5  sec 
later,  the  stimulus  was  presented.  The  stimulus  was  always  presented  for  a  full  temporal  cycle 
starting  with  a  random  phase.  To  remove  locational  cues,  the  first  and  last  frames  were  always 
identical.  The  subjects’  task  was  to  indicate  by  a  key  press  which  one  of  two  possible  motion 
directions  was  perceived  and  to  give  a  confidence  rating  ranging  from  0  (totally  uncertain)  to  5 
(absolutely  sure). 

Initially,  subjects’  motion-direction  discrimination  thresholds  were  measured  without 
pedestals. The method of constant stimuli was used to generate psychometric functions for a set of
temporal frequencies for each motion stimulus type. At least 100 observations were made for each
subject at the five points (determined by preliminary observations) that best defined the
psychometric function (probability correct versus modulation amplitude). For a given motion
stimulus of type s and temporal frequency f, we defined the subject’s threshold $m_{75}(s,f)$ as the
amplitude corresponding to the 75% correct point on the psychometric function. In the
subsequent pedestal test, the amplitude of the motion stimulus was always set at $m_{75}(s,f)$. On
each trial, the modulation amplitude of the pedestal was randomly set either at 0 or at $2m_{75}(s,f)$.
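
As an illustration of how a 75%-correct threshold and the corresponding pedestal amplitude might be read off a five-point psychometric function, the sketch below interpolates linearly on a log-amplitude axis. The interpolation rule is an assumption (the report does not state how the 75% point was estimated), and the data values are invented placeholders, not measurements from this study.

```python
import math

def threshold_at(amplitudes, prop_correct, criterion=0.75):
    """Amplitude at the criterion, by linear interpolation on log amplitude."""
    points = sorted(zip(amplitudes, prop_correct))
    for (a0, p0), (a1, p1) in zip(points, points[1:]):
        if p0 <= criterion <= p1 and p1 > p0:
            frac = (criterion - p0) / (p1 - p0)
            log_a = math.log(a0) + frac * (math.log(a1) - math.log(a0))
            return math.exp(log_a)
    raise ValueError("criterion is not bracketed by the data")

# Hypothetical psychometric data: modulation amplitude vs. proportion correct.
amplitudes = [0.001, 0.002, 0.004, 0.008, 0.016]
prop_correct = [0.52, 0.60, 0.70, 0.80, 0.95]

threshold = threshold_at(amplitudes, prop_correct)   # test amplitude for the pedestal runs
pedestal_amplitude = 2.0 * threshold                 # 2x pedestal used on pedestal trials
```

With these placeholder data the 75% point falls midway (in log amplitude) between the third and fourth stimulus levels.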

Within a session, all temporal frequencies and pedestal amplitudes were mixed. We compared
every subject’s performance with and without the pedestal in the same session. Two male observers
with corrected-to-normal vision served as subjects for the data reported here.

Results 

Both subjects clearly perceived apparent motion in all the motion-stimulus-alone conditions
when the sine amplitude was sufficient, and on the whole, produced quite similar data. The
quantitative results of one are summarized in Fig. 1m. (A) The temporal tuning functions for all
the motion types show typical lowpass filter characteristics (curves slope down to the right) within
the temporal frequency range we tested (0.94 to 15.0 Hz). (B) The temporal tuning functions can
be divided into two groups: luminance grating and contrast grating as one group (upper curves,
Fig. 1m), depth-defined grating and motion-defined grating as another group (lower set of curves).
Within each group, the shape of the temporal tuning functions is remarkably similar. (C) The
presence of a 2x pedestal had absolutely no effect on subjects’ performances in the luminance and
contrast modulation conditions (18), but it reduced performance to merely chance-guessing levels
with the depth-defined and motion-defined gratings. For pedestalled depth and motion-motion
stimuli,  subjects  reported  that  they  perceived  only  back-and-forth  motion,  and  could  not  judge  the 
direction  of  (patently  invisible)  coherent  motion  (19). 

These results clearly indicate that there are two qualitatively different motion extraction
mechanisms. One mechanism utilizes only the motion energy computation and has a much higher
cutoff (12 Hz) in its temporal sensitivity characteristics. It subserves both luminance (first-order)
and contrast (second-order) stimuli. Interestingly, the contrast-motion system has the same temporal
frequency characteristics as the luminance-motion system, despite frequent speculation that the
second-order system is "slower" than the first-order system (20). The second mechanism is slower,
but  can  detect  motion  in  stimuli  that  are  invisible  to  the  first  mechanism. 
Four Confirming Procedures

To  further  understand  these  results,  we  report  briefly  four  subsequent  procedures  (21). 

(A) We superimpose (linearly add) pedestalled luminance and contrast stimuli moving in the
same direction or in opposite directions (each with its own pedestal). Adding equal-strength
motion stimuli in opposite directions cancels any perceived motion. The perceived motion strength
of stimuli moving in the same direction is given by probability summation of the strengths of the
two component stimuli; there is no dependence on the relative phases of the two stimuli. If the
two kinds of stimuli were combined prior to the motion computation, the sign of the combination
would depend on the relative phase of the components; indeed, stimuli of the same frequency
moving in the same direction but with opposite phases could cancel all perceived motion. The
absence of any phase dependence means that luminance- and contrast-motion strengths are first
computed separately; then, the two motion strengths are combined.
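
The logic of the phase argument can be made concrete with a simplified sketch. It treats motion strength as the squared amplitude of a sinusoid (the response of a Reichardt detector to a single sine grows with the square of its amplitude) and compares two hypothetical architectures. This is a simplification for illustration only: it does not model the probability summation reported above, and both function names are invented.

```python
import math

def strength_if_combined_before(amp_1, amp_2, relative_phase):
    """Squared amplitude of the summed sinusoid: components add before motion analysis."""
    in_phase = amp_1 + amp_2 * math.cos(relative_phase)
    quadrature = amp_2 * math.sin(relative_phase)
    return in_phase ** 2 + quadrature ** 2

def strength_if_combined_after(amp_1, amp_2, relative_phase):
    """Sum of separately computed strengths; relative phase is deliberately unused."""
    return amp_1 ** 2 + amp_2 ** 2

phases = [0.0, math.pi / 2.0, math.pi]
before = [strength_if_combined_before(1.0, 1.0, p) for p in phases]
after = [strength_if_combined_after(1.0, 1.0, p) for p in phases]
```

Under early combination the strength ranges from four times a single component (in phase) down to zero (opposite phase); under late combination it is the same at every phase, which is the pattern the data show.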

(B) A pedestalled stimulus was created with only four frames per cycle, successive frames
being separated by 90 deg. Motion in this stimulus is perceived as well as in a continuously
sampled stimulus (Fig. 1f versus Fig. 2). However, directing successive frames alternately into left
and right eyes absolutely destroyed our subjects’ ability to perceive the direction of motion. With
pedestalled stimuli, the direction of motion can be computed only monocularly, not interocularly.


Insert  Figure  2  here. 


(C)  On  the  other  hand,  presenting  successive  frames  of  the  motion-from-motion  stimulus 
(with no pedestal) to alternate eyes (interocular presentation, Fig. 2) only slightly increases
threshold for motion-direction discrimination relative to a monocular presentation. This indicates
that the motion-from-motion computation is inherently binocular.

(D)  Consider  the  display  of  a  luminance  sinusoid  with  successive  frames  separated  by  90 
deg (Fig. 2). It can be viewed either monocularly (all frames in the same eye) or interocularly
(successive frames in alternate eyes). Converting from monocular to interocular presentation raises
the contrast threshold (at low frequencies) by a factor of 12 (to 2%) and changes the frequency
cutoff from 12 to 3 Hz, exactly like that of the depth and motion-from-motion stimuli (Fig. 1m).
This shows that the interocular luminance grating is perceived by the feature-tracking mechanism;
this mechanism exhibits exactly the same frequency cutoff when it detects motion in luminance
stimuli as it does when it detects motion in depth and motion-from-motion stimuli.

A consequence of the above is that the motion of an apparently simple stimulus, such as a
drifting luminance grating, is computed by all three systems: the monocular luminance system,
which is fast and sensitive; the binocular feature-tracking system, which is slow and less sensitive;
and at a double frequency, the monocular fullwave-contrast system (which is relatively insensitive
to this kind of stimulus). The drifting grating, which is regarded as a universal tool for visual
psychophysics, turns out to be not a particularly useful tool for discriminating among motion
mechanisms.
Selective attention affects the direction of perceived motion

Suppose that the binocular mechanism were a feature-tracking mechanism. Then it would
track whatever features are dominant in successive stimuli, even when the successive stimuli are
composed  of  entirely  different  materials.  We  tested  this  prediction  by  alternating  depth  gratings 
with  texture  gratings.  Frames  1,3,  and  5  consisted  of  a  depth  grating;  frames  2  and  4  consisted  of 
a  texture  grating  (Fig.  3).  In  depth  gratings,  all  subjects  naturally  attend  to  the  near-appearing 
peaks.  In  our  texture  grating,  in  successive  sessions,  subjects  were  asked  to  attend  either  to  the 
right-slanting  higher-spatial  frequency  grating  or  to  the  left-slanting  lower  spatial  frequency  grating. 
The  display  was  arranged  so  that  if  one  texture  feature  were  dominant,  the  display  would  appear  to 
move  to  the  right;  if  the  other  were  dominant,  it  would  appear  to  move  to  the  left.  From  trial  to 
trial,  these  relations  were  reversed  randomly.  If  an  observer  were  to  selectively  track  only  the 
grating  feature,  or  only  the  depth  feature,  the  display  would  be  completely  ambiguous.  To 
perceive  unambiguous  motion,  the  observer  must  bind  two  features,  i.e.,  perceive  movement  from 
the  attended  depth  feature  (near)  to  the  attended  texture  feature  (e.g.,  left-slanted  coarse  texture). 


Insert  Figure  3  here. 


In  formal  experiments,  sequences  of  five  successive  stimuli  (A,B,C,D,A;  Fig.  3)  were 
presented  at  a  frequency  of  2.5  Hz  so  that  an  entire  display  was  completed  in  500  msec.  Two 
subjects  consistently  perceived  motion  in  the  direction  corresponding  to  the  attended  feature  in 
more than 95% of trials (chance equals 50%). Thus, the same stimulus (Fig. 3) was perceived as
moving to the right when observers attended to fine texture and as moving to the left when they
attended to the coarse texture. This indicates that not only stimulus properties but also attention
determines what features are tracked (22).

The  influence  on  perceived  motion  by  selective  attention  is  inherently  a  top-down  process. 
Verbal  instructions  to  the  subject  prior  to  the  trial  are  processed  at  a  cognitive  level.  The  output  of 
this high-level process is used to control a low-level filter, which controls the input to the
feature-tracking mechanism by selectively admitting either coarse or fine textures. The low cutoff
frequency  of  the  feature-tracking  system,  about  3  Hz,  suggests  that  the  shortest  period  within 
which  attention  can  be  moved  is  about  0.33  sec,  a  period  that  roughly  corresponds  with  the 
dynamics  of  shifts  of  visual  attention  that  are  measured  in  quite  different  paradigms  (23).  These 
results indicate that the feature-tracking mechanism is affected by both top-down attentional
processes  and  by  automatic  bottom-up  processes,  each  of  which  contributes  to  the  strength  of  the 
tracked  features. 

Discussion  and  Conclusions 

The  fast,  monocular  system.  The  results  reported  above  are  embodied  in  the  functional 
flowchart  of  Fig.  4.  The  left  side  of  Fig.  4  shows  the  fast  monocular  system:  separate  Reichardt 
detectors  for  inputs  from  each  eye,  and  separate  detectors  for  luminance  (first-order)  and  contrast 
(second-order)  stimuli  within  the  eye.  These  detectors  are  fast  (cutoff  freq  =  12  Hz),  sensitive  (can 
detect  contrasts  of  0.2%),  and  the  outputs  are  combined  after  motion  is  computed.  Survival  of 
pedestal tests indicates the fast mechanisms use Reichardt (or equivalent) mechanisms for all
stimuli being processed; failure to survive interocular presentations indicates purely monocular
mechanisms.


Insert  Figure  4  here. 


Feature tracking. The right-hand side of Fig. 4 represents the feature-tracking system. It is
almost as efficient at detecting motion in interocular as in monocular displays; i.e., it is
inherently binocular. Like the fast mechanism, it can compute motion of luminance and contrast
stimuli (although in a more restricted speed range and with much less sensitivity). It can also
compute motion for stimuli that are invisible to the fast mechanism if they have readily identifiable
features (such as moving depth gratings and motion-from-motion displays). To solve the
motion-from-motion displays requires input of the direction-of-motion feature from the monocular system.
For all stimuli, feature tracking exhibits a characteristic 3 Hz cutoff. Attention can determine
which of two competing features is tracked; this top-down control of the inputs to the
feature-tracking computation is indicated by the arrow from Cognitive Processing to a feature-weighting
component.

The mechanism of motion detection in the feature-tracking system has not been determined. This is
because feature specification is inherently coarsely quantized (a feature is either present or absent).
A Reichardt computation fails with a coarsely quantized pedestalled stimulus. Therefore, we
cannot say whether the failure of pedestalled motion to survive interocular manipulations is caused
by a Reichardt mechanism confronted with a too-coarsely quantized (or too-noisy) stimulus or
whether feature tracking uses an entirely different algorithm. From a biological point of view,
it  seems  plausible  that  all  motion  computations  would  use  a  similar  algorithm,  perhaps  embodied  in 
a  common  patch  of  genetic  code,  and  that  only  prior  transformations  and  spatio-temporal 
parameters  would  distinguish  the  two  levels  of  computation. 

Higher  processes,  methodology.  The  current  model  deals  with  motion-direction 
discrimination.  The  outputs,  especially  of  the  fast  first-order  Reichardt  detectors,  have  been 
proposed  as  the  inputs  to  perceptual  processes  that  compute  velocity  (24),  3D  structure  from 
motion (25), and other useful properties. The pedestal test provides a useful means for
distinguishing  between  classes  of  motion  extraction  mechanisms,  and  we  anticipate  its  application 
in  these  and  other  stimulus  domains. 
11. Zero crossings, the motion cue proposed by Marr and Ullman (see reference 3), follow the
same back-and-forth path as do peaks or valleys.

12. Let x denote the horizontal spatial coordinate and y the vertical spatial coordinate. The
luminance grating is defined as

$$L_1(x,y,t) = L_0\left[1.0 + m(1,f_1)\sin\left(2\pi(\alpha_1 x + f_1 t)\right)\right]$$

where $L_0 = 115$ cd/m$^2$, $\alpha_1 = 2.55$ cpd, and $f_1$ = 0.94, 1.88, 3.75, 7.50, 15.0 Hz. The grating
extends 3.13 degrees horizontally and 1.57 degrees vertically.
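
A minimal sketch of how the luminance grating of note 12 could be evaluated at a given position and time follows. It assumes the reading of the formula in which luminance is the mean level times one plus a sinusoidal modulation; the function name, the constant names, and the 5% modulation used in the example are illustrative assumptions, while the mean luminance and spatial frequency are taken from the note.

```python
import math

MEAN_LUMINANCE = 115.0   # L0 in cd/m^2, from note 12
SPATIAL_FREQ = 2.55      # alpha_1 in cycles per degree, from note 12

def luminance_grating(x_deg, t_sec, modulation, temporal_freq_hz):
    """Luminance (cd/m^2) of the drifting grating at position x and time t."""
    phase = 2.0 * math.pi * (SPATIAL_FREQ * x_deg + temporal_freq_hz * t_sec)
    return MEAN_LUMINANCE * (1.0 + modulation * math.sin(phase))

# Example: one row of 32 samples across the 3.13-degree width at t = 0,
# with an assumed 5% modulation depth and the lowest temporal frequency.
row = [luminance_grating(3.13 * i / 32.0, 0.0, 0.05, 0.94) for i in range(32)]
at_origin = luminance_grating(0.0, 0.0, 0.05, 0.94)
```

Every sample stays within 5% of the mean luminance, and the value at the origin at time zero equals the mean.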

13. The contrast modulation grating is defined as:

$$L_2(x,y,t) = L_0\left[1 + R(x,y)\left(0.5 + m(2,f_2)\sin\left(2\pi(\alpha_2 x + f_2 t)\right)\right)\right]$$

where $L_0 = 115$ cd/m$^2$, $\alpha_2 = 1.28$ cpd, $f_2$ = 0.94, 1.88, 3.75, 7.50, 15.0 Hz, and $R(x,y)$ is a
random number which assumes value +1 and -1 with equal probability. The grating has the
same size as the luminance grating.
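
The sketch below illustrates why a stimulus of this form carries no net luminance signal yet reveals its modulation after fullwave rectification. It assumes the reading of note 13 in which a random ±1 carrier multiplies a sinusoidally modulated contrast; the function names and the example modulation depth are illustrative assumptions.

```python
import math

MEAN_LUMINANCE = 115.0   # L0 in cd/m^2, from note 13
SPATIAL_FREQ = 1.28      # alpha_2 in cycles per degree, from note 13

def local_contrast(x_deg, t_sec, modulation, temporal_freq_hz):
    """Sinusoidally modulated contrast around a base level of 0.5."""
    phase = 2.0 * math.pi * (SPATIAL_FREQ * x_deg + temporal_freq_hz * t_sec)
    return 0.5 + modulation * math.sin(phase)

def contrast_grating(x_deg, t_sec, carrier_sign, modulation, temporal_freq_hz):
    """Luminance of one pixel whose random carrier value is +1 or -1."""
    c = local_contrast(x_deg, t_sec, modulation, temporal_freq_hz)
    return MEAN_LUMINANCE * (1.0 + carrier_sign * c)

def expected_luminance(x_deg, t_sec, modulation, temporal_freq_hz):
    """Average over the two equally likely carrier values."""
    bright = contrast_grating(x_deg, t_sec, +1, modulation, temporal_freq_hz)
    dark = contrast_grating(x_deg, t_sec, -1, modulation, temporal_freq_hz)
    return 0.5 * (bright + dark)

def rectified(x_deg, t_sec, carrier_sign, modulation, temporal_freq_hz):
    """Fullwave rectification: absolute deviation from the mean, in contrast units."""
    lum = contrast_grating(x_deg, t_sec, carrier_sign, modulation, temporal_freq_hz)
    return abs(lum - MEAN_LUMINANCE) / MEAN_LUMINANCE

probe = (0.2, 0.1, 0.3, 0.94)   # x (deg), t (s), modulation, temporal frequency (Hz)
mean_here = expected_luminance(*probe)
recovered = rectified(0.2, 0.1, -1, 0.3, 0.94)
```

The expected luminance equals the mean level at every point, while the rectified signal recovers the drifting contrast envelope regardless of the carrier sign.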

14. The depth grating was made of white (129.4 cd/m$^2$) random dots (0.733' x 1.466') on gray
background (105.1 cd/m$^2$) with horizontal disparity between left and right eyes defined as:

$$D(x,y,t) = 0.733'\,\mathrm{Int}\left(m(3,f_3)\sin\left(2\pi(\alpha_3 y + f_3 t)\right)\right)$$

where $\alpha_3 = 1.28$ cpd, $f_3$ = 0.94, 1.88, 3.75 Hz. Int() is a function that takes real numbers as
input and rounds them off to produce integer outputs. The rounding is needed because the pixels are
discrete. The stimulus for each eye extends 2.94 degrees vertically and 1.47 degrees
horizontally. In each frame, there is a 40% probability for a dot to be white and no correlation
exists between frames.

15. The motion-defined moving stimulus was also made of white dots (215.6 cd/m$^2$, 5.84' x 5.84')
on gray background (115 cd/m$^2$). The probabilities for the random dots in a given column to
move up $P_u$ or down $P_d$ are defined as:

$$P_u(x,y,t) = 0.5 + m(4,f_4)\sin\left(2\pi(\alpha_4 x + f_4 t)\right)$$

$$P_d(x,y,t) = 1.0 - P_u$$

where $\alpha_4 = 0.32$ cpd, $f_4$ = 0.94, 1.88, 3.75 Hz. There is a 40% probability for a dot to be white
and a dot moves 5.84' from one frame to the next one. The whole stimulus extends 7.04
degrees horizontally and 4.69 degrees vertically.
AT-Vista video graphics adapter. The displays for the experiments were presented on a 60 Hz
frequency, temporal frequency, display duration and phase randomization are exactly the same
Figure Captions

Figure 1. (a) Reichardt motion detector (simplified). A and B indicate adjacent locations of
visual receptive fields, τ is a temporal delay, × indicates multiplication, and − indicates
subtraction. Outputs greater than zero indicate stimulus motion from A to B; outputs less than
zero indicate stimulus motion from B to A. (b) A stationary sinewave (the pedestal). (c) A
moving sinewave (the test stimulus). (d) Pedestalled motion: the sum of (b) and (c). The
pedestal  has  twice  the  amplitude  of  the  motion  stimulus,  which  moves  1/8  of  a  spatial  cycle  to 
the  right  from  one  frame  to  the  next.  The  dotted  line  indicates  that  the  peaks  in  each  frame 
follow  a  zigzag  path  and  no  coherent  "net"  motion  direction  exists  for  any  mechanism  that 
tracks  the  peak  locations.  (e,f,g)  Pedestalled  luminance-motion  (first-order  motion)  (e)  A 
freeze-ftame  snapshot  of  a  luminance  sinewave.  Alternatively,  interpreting  the  vertical  axis  as 
time  (instead  of  vertical  space)  converts  (e)  into  a  space-time  representation  of  a  stationary 
luminance  sinewave,  an  instantiation  of  (b).  (f)  A  space-time  representation  of  a  moving 
luminance  sinewave,  an  instantiation  of  (c).  The  vertical  axis  is  time,  the  horizontal  axis  is 
space,  (g)  The  sum  of  (e)  and  (f),  a  luminance  grating  moving  over  a  pedestal,  an  instantiation 
of  (d).  (h,i,j)  Pedestalled  contrast-modulated  motion  (second-order  motion).  Similar  to  (e,f,g) 
except  instead  of  a  sinusoidal  modulation  of  luminance,  the  contrast  of  a  black-white  noise  field 
is  sinusoidally  modulated,  (k)  Representation  of  the  appearance  of  the  depth  grating;  the  actual 
depth  grating  was  composed  of  random  black-white  noise,  the  depth  resulted  from  stereoscopic 
images.  (1)  Representation  of  the  motion-ftom-motion  stimulus.  The  arrows  indicate  the 
directions  of  motion  of  random  dots;  the  pattern  of  motion  modulation  (up  versus  down)  moves 
either  to  the  left  or  to  the  right  (m)  Experimentally  measured  threshold  modulations  for  correct 
left-right  motion  discrimination  versus  temporal  frequency  (Hz)  of  a  moving  sinusoid.  The  axes 
are  log  scales.  O  indicates  luminance  (Fourier,  FO)  motion  for  either  pedestalled  or 
nonpedestalled  stimuli  (thresholds  are  identical);  A  indicates  contrast  modulated  (fullwave,  FW) 
motion  for  either  pedestalled  or  nonpedestalled  stimuli;  -t-  indicates  simple  (nonpedestalled) 
sinusoidal  depth  (DP)  stimuli;  x  indicates  simple  sinusoidal  motion-from-motion  (MS)  stimuli; 
?  indicates  simple  sinusoidal  interocular  (110  luminance  stimuli.  The  curves  have  been 
vertically  translated  to  expose  their  similarity  in  shape. 
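
The two properties described in panels (a) through (d) can be checked numerically. The sketch below is a minimal illustration, not the procedure used in the study: it assumes a detector delay of one frame, a separation of 1/8 spatial cycle between A and B, and an arbitrary spatial phase for A. All function and parameter names are hypothetical. It computes the output of a simplified Reichardt detector averaged over one temporal cycle, and the phase offset of the compound waveform's peak on each frame.

```python
import math

FRAMES = 8                      # 1/8 cycle per frame gives 8 frames per cycle
STEP = 2.0 * math.pi / FRAMES   # phase advance of the moving sinewave per frame


def stimulus(theta, n, m, c):
    """Modulation at spatial phase theta on frame n.

    m: amplitude of the rightward-moving sinewave
    c: amplitude of the stationary pedestal
    """
    return c * math.sin(theta) + m * math.sin(theta - STEP * n)


def reichardt_mean(m, c, theta_a=0.3, sep=math.pi / 4):
    """Mean over one cycle of A(n-1)*B(n) - A(n)*B(n-1); B lies to the right of A."""
    total = 0.0
    for n in range(FRAMES):
        a_now = stimulus(theta_a, n, m, c)
        a_prev = stimulus(theta_a, n - 1, m, c)
        b_now = stimulus(theta_a + sep, n, m, c)
        b_prev = stimulus(theta_a + sep, n - 1, m, c)
        total += a_prev * b_now - a_now * b_prev
    return total / FRAMES


def peak_offsets(m, c):
    """Phase offset (radians) of the compound waveform's peak on each frame."""
    return [
        math.atan2(m * math.sin(STEP * n), c + m * math.cos(STEP * n))
        for n in range(FRAMES)
    ]


plain = reichardt_mean(m=0.1, c=0.0)       # moving sinewave alone
pedestalled = reichardt_mean(m=0.1, c=0.2)  # pedestal at twice the amplitude
offsets = peak_offsets(m=0.1, c=0.2)
```

Under these assumptions the cycle-averaged detector output is positive (motion from A to B) and is the same with and without the pedestal, whereas the peak offsets oscillate within 30 degrees of phase on either side of the pedestal peak and sum to zero over a cycle.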

Figure 2. Schematic representation of interocular stimulus presentations. Stimuli are displayed
only with spatial phase shifts of 90 deg relative to the initial display, indicated by the tick on
top of the leftmost peak. At each eye, the stimulus sequence, indicated on the bottom, is
ambiguous as to direction of motion.
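
The ambiguity at each eye can be illustrated with a few lines of arithmetic on the phase steps. The sketch below assumes that successive displays alternate between the two eyes, so that each eye receives every second display; this alternation, the labels "forward" and "backward", and the function names are assumptions made for the example.

```python
def classify_step(prev_deg, next_deg):
    """Direction implied by the phase step between two successive displays."""
    step = (next_deg - prev_deg) % 360
    if step in (0, 180):
        # 0 deg: no displacement; 180 deg: both directions fit equally well
        return "ambiguous"
    return "forward" if step < 180 else "backward"


def classify_sequence(phases):
    """Classify every step in a sequence of spatial phases (degrees)."""
    return [classify_step(a, b) for a, b in zip(phases, phases[1:])]


combined = [90 * n for n in range(8)]   # 90 deg shift per display
left_eye = combined[0::2]               # assumed: even-numbered displays
right_eye = combined[1::2]              # assumed: odd-numbered displays

combined_steps = classify_sequence(combined)
left_steps = classify_sequence(left_eye)
right_steps = classify_sequence(right_eye)
```

Each eye alone sees only 180 deg steps, which carry no direction, while the combined sequence steps consistently by 90 deg.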

Figure 3. A display for demonstrating attentional effects in the perceived direction of motion.
A sequence of four consecutive displays is shown; each is displaced by 90 deg from the
previous one. In the depth stimuli, the tracked features are always the near peaks (upper peaks
in the panels). When the subject attends to (tracks) the fine stripes, the perceived direction of
motion between displays a-d is from left to right; attending to the coarse stripes yields
right-to-left perceived motion. Duration of the entire display is only 0.50 sec, too fast to
permit eye movements.
Figure 4. Functional architecture of the visual motion system. The left half represents the fast
monocular system; the right half represents the feature-tracking system. L, R indicate Left and
Right eye signals, respectively; RD indicates Reichardt detector; TG indicates texture grabber (a
spatial filter followed by fullwave rectification); Σ indicates (possibly complex) summation; ×
represents multiplication, the differential weighting of features determined by attention (the
arrow from "Cognitive Processes"); the central horizontal arrow represents the motion-feature
input needed to solve motion-from-motion stimuli.
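
The role of fullwave rectification in the texture grabber can be illustrated on a one-dimensional contrast-modulated noise. The sketch below covers only the rectification stage and leaves out the preceding spatial filter; the binary carrier, the modulation depth, and all names are hypothetical choices for the example.

```python
import math
import random


def contrast_modulated_noise(n, period, m, rng):
    """Binary (+1/-1) noise carrier whose contrast follows a sinusoidal envelope."""
    envelope = [1.0 + m * math.sin(2.0 * math.pi * i / period) for i in range(n)]
    carrier = [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]
    signal = [e * c for e, c in zip(envelope, carrier)]
    return envelope, signal


def fullwave_rectify(signal):
    """Fullwave rectification: the absolute value of each sample."""
    return [abs(v) for v in signal]


envelope, signal = contrast_modulated_noise(64, 16, 0.5, random.Random(0))
rectified = fullwave_rectify(signal)
```

Because the carrier takes only the values +1 and -1, the rectified signal equals the contrast envelope sample by sample. The sinusoidal contrast modulation, which produces no sinusoidal modulation of expected luminance, thereby becomes a sinusoid that a Reichardt detector can process.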